The Beginning

Kajetan Kubik

Business objective: predict total sales based on historical data.

For validation, I try to predict sales for which I already have the data, so predictions can be compared against actual values.


Reading the data


I take the data from the Kaggle competition Predict Future Sales. The data contains information about sales from 60 shops.

Data contains 5 files:




Exploring the data


Let's start by choosing only one shop, because it is the more realistic situation: a single shop asking someone to predict its sales, rather than a whole chain of stores.

I choose the shop with id 31, which is the Semenovskiy Shopping & Entertainment Center in Moscow.

Since I want to predict based only on previous sales, I can drop the item_price column, and of course shop_id.

I change the date format and also add a column storing the date without days (so basically a copy of the date_block_num column).
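The filtering and date handling described above can be sketched in a few lines of pandas. The tiny DataFrame below is a stand-in for the real sales_train data, using the column names the post mentions (date_block_num, shop_id, item_price, item_cnt_day); the sample values are invented for illustration:

```python
import pandas as pd

# a tiny stand-in for the real sales data (same columns, invented rows)
sales = pd.DataFrame({
    "date": ["02.01.2013", "05.01.2013", "03.02.2013"],
    "date_block_num": [0, 0, 1],
    "shop_id": [31, 25, 31],
    "item_id": [22154, 2552, 22154],
    "item_price": [999.0, 899.0, 999.0],
    "item_cnt_day": [1.0, 1.0, 2.0],
})

# keep only shop 31 and drop the price and shop id columns
shop = sales[sales["shop_id"] == 31].drop(columns=["item_price", "shop_id"])

# parse the dd.mm.yyyy dates and add a month-level column
# (essentially a copy of date_block_num, but as a calendar month)
shop["date"] = pd.to_datetime(shop["date"], format="%d.%m.%Y")
shop["month"] = shop["date"].dt.to_period("M")
```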

Let's look at how sales behave in each month.

The marked areas are December. Despite the fact that Russia celebrates Christmas in January, the sales peak is near ours.

Let's check which items people buy in this shop most often.

We can observe that the first item is 16 times more popular than the second one. It happens to be... a foil bag.

Here is what sales of the other 100 most popular items look like.


Quick outlier check


Let's quickly check whether there are any outliers in the item_cnt_day column.

There were 2 sales deviating from the norm, so I just drop them.
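Dropping the deviating rows is a one-liner with a boolean mask. The post doesn't say exactly how the two outliers were identified, so the thresholds below (and the sample data) are my own illustrative assumptions:

```python
import pandas as pd

# stand-in daily sales; two rows are far outside the normal range
shop = pd.DataFrame({
    "item_id": [1, 2, 3, 4, 5],
    "item_cnt_day": [1.0, 2.0, 1.0, 500.0, -200.0],
})

# assumed thresholds: flag implausibly large counts and
# negative counts beyond ordinary single-item returns
outliers = (shop["item_cnt_day"] > 100) | (shop["item_cnt_day"] < -50)
shop = shop[~outliers]
```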


Total sales prediction


Mathematical tools


So it is obvious that the data has some patterns; we only need to find them! Let's check whether the difference in total sales has a pattern.

This plot shows the difference in total sales over time. We can observe that some patterns certainly exist. Let's use double exponential smoothing, a mathematical tool which maintains a level component and a trend component at each period. It uses two weights (also called smoothing parameters) to update the components at each period. The double exponential smoothing equations are as follows:

$L_t = \alpha V_t + (1 - \alpha)[L_{t-1} + T_{t-1}]$

$T_t = \beta [L_{t} - L_{t-1}] + (1 - \beta)T_{t-1}$

$\hat{V}_t = L_{t-1} + T_{t-1}$

where:

$L_t$ is the level at time $t$, $T_t$ is the trend at time $t$, $V_t$ is the actual value at time $t$, $\hat{V}_t$ is the one-step-ahead prediction for time $t$, and $\alpha$ and $\beta$ are the smoothing parameters.
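The update equations translate directly into a short loop. A minimal sketch, initialising the level with the first observation and the trend with the first difference (one common choice; the post doesn't state its initialisation):

```python
def double_exponential_smoothing(series, alpha, beta):
    """One-step-ahead predictions from Holt's double exponential smoothing."""
    level = series[0]                 # initial level L_0
    trend = series[1] - series[0]     # initial trend T_0
    preds = [level + trend]           # prediction for t = 1
    for t in range(1, len(series)):
        value = series[t]
        prev_level = level
        # L_t = alpha * V_t + (1 - alpha) * (L_{t-1} + T_{t-1})
        level = alpha * value + (1 - alpha) * (level + trend)
        # T_t = beta * (L_t - L_{t-1}) + (1 - beta) * T_{t-1}
        trend = beta * (level - prev_level) + (1 - beta) * trend
        preds.append(level + trend)   # prediction for t + 1
    return preds
```

With `alpha = 1` and `beta = 1` the method simply extrapolates the last observed step, which is a handy sanity check.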

In the graph above we can observe how this method behaves with different parameters. The best fit is $\alpha = 0.9$ and $\beta = 0.02$, and according to this prediction we can expect slightly above 7k sales in November 2015.


Main method: Prophet


The main tool for prediction that I chose is fbProphet, the Facebook forecasting library.

Besides predicting values for a selected amount of time, the model also predicts a confidence interval (in my case 95%).


from fbprophet import Prophet  # in newer versions: from prophet import Prophet

# fit on a dataframe with the two columns Prophet expects: ds (date) and y (value)
prophet_model = Prophet(interval_width=0.95, daily_seasonality=True)
prophet_model.fit(prophet_df)

# extend the frame 30 days into the future and predict over it
future_dates = prophet_model.make_future_dataframe(periods=30)
forecast = prophet_model.predict(future_dates)

So, let's check the total predicted sales for October, see how wrong the model was, and then predict sales in November, which is our final goal.
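Prophet's forecast frame has one row per day with `ds`, `yhat`, `yhat_lower`, and `yhat_upper` columns, so a monthly total is just a groupby-sum. A sketch on a stand-in frame with constant invented values (the real one comes from `prophet_model.predict`):

```python
import pandas as pd

# stand-in for the frame returned by prophet_model.predict():
# 61 daily rows covering October and November 2015, constant values
forecast = pd.DataFrame({
    "ds": pd.date_range("2015-10-01", periods=61, freq="D"),
    "yhat": [230.0] * 61,
    "yhat_lower": [180.0] * 61,
    "yhat_upper": [280.0] * 61,
})

# total predicted sales per month, with the 95% interval bounds
monthly = (
    forecast
    .groupby(forecast["ds"].dt.to_period("M"))[["yhat", "yhat_lower", "yhat_upper"]]
    .sum()
)
```

The October row of `monthly` can then be compared directly against the actual October total, and the November row is the final prediction.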

Why not 'normal' regression?

If we try to predict the next month, December, our model gives a pretty good prediction.


Bonus


Predicting items sales using LSTM


Since we already have such detailed data, we could try to do the thing this data set was created for: predicting November sales of specific items. For this task I chose an LSTM (Long Short-Term Memory) network, whose concept I understand, though I don't know much beyond that.

Let's start by making a pivot table where rows are items, columns are the 34 months (0 is January 2013 and 33 is October 2015), and each cell holds the total sales of item I in month M.
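This reshaping is exactly what `pivot_table` does; a sketch on a small stand-in frame with invented rows:

```python
import pandas as pd

# stand-in sales records: (item, month index, daily count)
sales = pd.DataFrame({
    "item_id": [10, 10, 20, 20, 20],
    "date_block_num": [0, 1, 0, 0, 33],
    "item_cnt_day": [1.0, 2.0, 1.0, 1.0, 3.0],
})

# rows = items, columns = months 0..33, values = total sales per month
pivot = sales.pivot_table(
    index="item_id",
    columns="date_block_num",
    values="item_cnt_day",
    aggfunc="sum",
    fill_value=0,
)
# ensure all 34 month columns exist even if an item sold nothing in some month
pivot = pivot.reindex(columns=range(34), fill_value=0)
```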

Let's remove the item_id column, because we want to predict based on historical total sales only.

To train the model, we will use data from the first 33 months, and the last one, October 2015, will be the target. I don't split the data into train and validation manually, because the built-in option (validation size = 30% of all data) does it for me.

My LSTM model will have 4 layers:

Training: (8 epochs, batch_size = 128)
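The post doesn't spell out the four layers, so the stack below is a plausible sketch (LSTM, dropout, dense, output), not the author's exact architecture; the random arrays stand in for the real 33-month histories and October targets. The hyperparameters from the text (8 epochs, batch_size = 128, 30% validation split) are used as-is:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# X: (items, 33 months, 1 feature), y: the 34th month — random stand-ins
X = np.random.rand(64, 33, 1).astype("float32")
y = np.random.rand(64, 1).astype("float32")

# an assumed 4-layer stack; the exact layers are not given in the post
model = keras.Sequential([
    keras.Input(shape=(33, 1)),
    layers.LSTM(64),                        # sequence encoder
    layers.Dropout(0.3),                    # regularisation
    layers.Dense(32, activation="relu"),    # hidden layer
    layers.Dense(1),                        # one predicted month per item
])
model.compile(loss="mse", optimizer="adam")

# validation_split is the built-in 30% hold-out mentioned above
model.fit(X, y, epochs=8, batch_size=128, validation_split=0.3, verbose=0)
```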


Check the results


Let's check what the predictions look like for a few chosen items.